Workbook homework due on Thursday
The pilot survey is in the works; expect to see a link next week. You’ll then need to take the survey yourself and provide feedback on the questions.
Begin looking at relationships between variables
Theoretical Arguments and Hypotheses
Elements of a good theory
Writing good hypotheses
Testing Hypotheses
(material here comes mostly from Chapter 3 of your textbook)
Describing: making generalizations about the world
Predicting: generating expectations about what will happen in the future
Explaining: explaining why things are related.
Explanation is often the toughest to achieve, but also the most desirable because it allows us to do things like make changes to reach a desired outcome.
“Why questions”
“a logically interconnected set of propositions from which empirical uniformities can be derived” – Robert K Merton
Theories are explanations, assumptions, claims and narratives that provide a set of expectations that link a cause to an effect.
Purely descriptive or predictive analyses don’t necessarily require a theory, but it’s a key component of explanatory research.
Theories vary in their scope:
What explains differences in the “path to modernity” across different countries during the 20th century?
Free markets/Democracy in the U.S. and England
Fascism in Germany and Japan
Communism in Russia and China
Barrington Moore: Classes have unique and conflicting interests. Conflicts over these interests come to the forefront during industrialization. The outcomes of these class conflicts shape the political and economic system.
Fascist states emerge when the landed aristocracy wins
Communist states emerge when the peasant class wins
Democracies emerge when the bourgeoisie (middle class) wins.
Moore’s theory borrows assumptions from a (sociological) Marxist grand theory about class conflict
He “tests” it by showing how it fits the selected cases.
It can be used and refined to generate a set of empirical expectations about what factors should matter for democratization. For instance, we might expect:
States with larger agricultural sectors during industrialization to be less democratic today (compared to states with smaller agricultural sectors)
States with higher literacy rates during industrialization to be more democratic (compared to states with lower literacy rates)
Good theories clearly identify:
A dependent variable(s): the outcome to be explained
One or more independent variables: the causal factors that determine the DV.
A causal mechanism that links these two things.
An expectation about the direction of the effect (positive, negative, something more complex)
Most social science theories are probabilistic instead of deterministic. So we’ll speak in terms of more/less likely or higher/lower.
Good theories should generate expectations that can be empirically tested (even if the theory itself can’t be tested)
What explains inconsistent answers to survey questions?
Asking the same people the same questions a few months apart yields surprisingly unpredictable results.
Small changes in question wording, ordering of choices, or survey context cause big changes in outcomes
Data from the 1980 ANES panel survey (reproduced from Zaller 1992)
Receive-Accept-Sample model (Zaller)
Receive: People hear persuasive messages
Accept: They accept some of these and reject others depending on their predispositions.
Sample: When they answer a survey, they “sample” from the top-of-mind considerations.
outcome: attitude stability, attitudes
causes: the volume and clarity of messages (especially from elites)
(some) Expectations:
More engaged people are more persuadable when partisan cues are low
People with low engagement/knowledge will develop more consistent answers when issues are highly salient
Why do people vote?
Since politicians generally offer public goods, you can enjoy the benefits of your preferred candidate winning even if you don’t vote
Since voting has costs (even though they’re small) free riding can be preferable to actually turning out if the costs outweigh the benefits.
Pivotal voting
Claim: people vote because they expect to sway the election
If this is true, then:
Turnout should be higher in close elections
Turnout should be higher when the electorate is small
Turnout will be higher in PR systems where one vote matters more.
Expressive voting model
Claim: people vote to enjoy the expressive benefits
If this is true then:
People with more extreme beliefs will be more likely to vote
Closeness or the size of the electorate shouldn’t matter much
It’s unlikely that either theory is entirely true, but both can be used to generate expectations and productive debate over the relative weight of the evidence favoring one explanation vs. another.
Theories give causal explanations for why something affects something else
Hypotheses are specific testable implications generated by that theory.
Theory: people vote because of expressive benefits
Hypothesis: people with more extreme views will be more likely to turn out.
Components:
Unit of analysis
Dependent variable
Independent variable
Direction of the predicted relationship
Good hypotheses inevitably involve comparative language (higher/lower/more/less/increase/decrease/better/worse)
In a comparison of [unit of analysis], those having [one value on the independent variable] will be [more/less] likely to have [one value on the dependent variable] than those having a [different value on the independent variable].
In a comparison of [voters], those having [stronger political views] will be more likely to [turn out to vote] than those having [weaker views].
Unit of analysis: voters
IV: strength of political views
DV: turnout
Relationship: strength increases turnout
In a comparison of [states], those having [a larger middle class during industrialization] will be more likely to have [democracy] than those having [a smaller middle class].
Unit of analysis: states
IV: size of the middle class
DV: democracy
Relationship: middle class increases likelihood of democracy
In a comparison of [survey respondents], those having [higher levels of attention to politics] will be more likely to have [consistent responses] than those having [lower levels of attention to politics].
Unit of analysis: Survey respondents
IV: level of attention
DV: response consistency
Relationship: attention increases consistency
Good hypotheses may suggest a more complex set of relationships than just “positive/negative”. They could propose conditional/interactive/curvilinear relationships as well.
The “oil curse”
In a comparison of [countries], those having [higher levels of GDP] will be [more likely to be democratic] compared to [countries with lower GDP], [however, this relationship will not hold for countries that get rich from oil exports.]
Retrospective voting:
In a comparison of [voters], those having [lower levels of attention to politics] will be [more likely to vote for the incumbent when the economy is doing well]. Those having [higher levels of attention to politics] will be [more likely to vote based on policy preferences regardless of the state of the economy]
The main determinant of war is the distribution of power in the international system.
In comparing individuals, annual income and the level of education are related.
Democracies are peaceful.
In comparing individuals, some people are more likely to favor the death penalty than others.
Testing hypotheses by making comparisons
Graphing and describing relationships
Assuming we have a categorical independent variable (IV) and a categorical dependent variable (DV):
| iv | dv |
|---|---|
| HIGH | No |
| HIGH | No |
| LOW | No |
| HIGH | Yes |
| HIGH | Yes |
| HIGH | Yes |
| HIGH | Yes |
| LOW | Yes |
| LOW | Yes |
| LOW | Yes |
Start by calculating the number of observations with each value of each category:
| dv | iv = LOW | iv = HIGH |
|---|---|---|
| No | 1 | 2 |
| Yes | 3 | 4 |
| Total | 4 | 6 |
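The counting step can be sketched in a few lines of Python. This is only an illustration using the ten observations listed above; the variable names (`observations`, `cell_counts`, `iv_totals`) are made up for the example.

```python
from collections import Counter

# The ten (iv, dv) observations from the raw data table.
observations = [
    ("HIGH", "No"), ("HIGH", "No"), ("LOW", "No"),
    ("HIGH", "Yes"), ("HIGH", "Yes"), ("HIGH", "Yes"), ("HIGH", "Yes"),
    ("LOW", "Yes"), ("LOW", "Yes"), ("LOW", "Yes"),
]

# Number of observations in each (iv, dv) cell of the cross tab.
cell_counts = Counter(observations)

# Number of observations at each value of the IV (the "Total" row).
iv_totals = Counter(iv for iv, _ in observations)

# Print one row per DV value, with the IV categories as columns.
for dv_value in ("No", "Yes"):
    row = [cell_counts[(iv_value, dv_value)] for iv_value in ("LOW", "HIGH")]
    print(dv_value, row)
print("Total", [iv_totals[iv_value] for iv_value in ("LOW", "HIGH")])
```

Statistical packages usually offer a ready-made cross-tab function (for example, `pandas.crosstab` in Python), but the logic is the same: count the observations in each combination of categories.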
Then, calculate the proportion/percentage of observations among each value of the IV.
If the independent variable is in the columns, then the columns should sum to 100%.
If the independent variable is in the rows, then the rows should sum to 100%.
| dv | iv = LOW | iv = HIGH |
|---|---|---|
| No | 1 (25%) | 2 (33%) |
| Yes | 3 (75%) | 4 (67%) |
| Total | 4 | 6 |
Look at what happens to the DV at different values of the IV. If your variables are ordinal, you should be able to identify a direction of the effect.
The proportion of “Yes” values decreases as the IV goes from lower to higher, so this is a negative or inverse relationship.
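As a rough sketch, the percentage step and the direction check might look like this in Python. The function name `column_percentages` and the layout of the `counts` dictionary are assumptions made for this example, not part of any particular package.

```python
# Cell counts from the cross tab, organized as counts[iv_value][dv_value].
counts = {
    "LOW": {"No": 1, "Yes": 3},
    "HIGH": {"No": 2, "Yes": 4},
}


def column_percentages(counts):
    """Percent of each DV value within each category of the IV."""
    result = {}
    for iv_value, dv_counts in counts.items():
        total = sum(dv_counts.values())  # all observations in this IV category
        result[iv_value] = {dv: 100 * n / total for dv, n in dv_counts.items()}
    return result


pct = column_percentages(counts)

# Compare the share of "Yes" at the low and high values of the IV.
change_in_yes = pct["HIGH"]["Yes"] - pct["LOW"]["Yes"]
if change_in_yes < 0:
    direction = "negative"
elif change_in_yes > 0:
    direction = "positive"
else:
    direction = "none"

print(round(pct["LOW"]["Yes"]), round(pct["HIGH"]["Yes"]), direction)
```

Because each percentage is calculated within one category of the IV, the values for each IV category sum to 100%.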
Using a bar graph or line graph can make these relationships easier to spot:
Key rule: always calculate percentages or proportions by categories of the independent variable.
If one or both variables are interval-level, you can bin them in order to use them in a cross tab. For instance, you could separate an interval-level variable like age into a series of age ranges.
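A minimal sketch of binning in Python, assuming hypothetical cut points and made-up ages chosen only for illustration (the right age ranges depend on your data and research question):

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points: the lower bounds of the 2nd, 3rd, and 4th bins.
cut_points = [30, 45, 65]
labels = ["18-29", "30-44", "45-64", "65+"]


def age_group(age):
    """Return the label of the age range that contains `age`."""
    return labels[bisect_right(cut_points, age)]


ages = [19, 30, 44, 45, 67, 82]  # made-up example values
groups = [age_group(a) for a in ages]

# The binned variable is now categorical and can go into a cross tab.
print(Counter(groups))
```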
Hypothesis: in a comparison of individuals, independents are less likely to turn out to vote compared to people who support one party or another.
How should I calculate proportions here?
| turnout2020 | Party ID: Democrat | Party ID: Independent | Party ID: Republican |
|---|---|---|---|
| 0. Did not vote | 335 | 316 | 382 |
| 1. Voted | 3160 | 560 | 2714 |
Are these results generally consistent with my hypothesis?
| turnout2020 | Party ID: Democrat | Party ID: Independent | Party ID: Republican |
|---|---|---|---|
| 0. Did not vote | 335 (10%) | 316 (36%) | 382 (12%) |
| 1. Voted | 3160 (90%) | 560 (64%) | 2714 (88%) |
If we think of party ID as an ordered variable, this is a curvilinear relationship.
What happens if I calculate % among the values of the DV?
Here’s the relationship between education and voter turnout with % calculated on education level:
| turnout2020 | Education: 1. Less than high school credential | 2. High school credential | 3. Some post-high school, no bachelor's degree | 4. Bachelor's degree | 5. Graduate degree |
|---|---|---|---|---|---|
| 0. Did not vote | 130 (41%) | 286 (24%) | 380 (15%) | 135 (7%) | 91 (6%) |
| 1. Voted | 185 (59%) | 883 (76%) | 2148 (85%) | 1749 (93%) | 1388 (94%) |
Note: column % in parentheses.
The results suggest a positive or direct relationship: as education increases, so does the % turnout.
Here’s the relationship between education and voter turnout with % calculated across voter turnout
| turnout2020 | Education: 1. Less than high school credential | 2. High school credential | 3. Some post-high school, no bachelor's degree | 4. Bachelor's degree | 5. Graduate degree |
|---|---|---|---|---|---|
| 0. Did not vote | 130 (13%) | 286 (28%) | 380 (37%) | 135 (13%) | 91 (9%) |
| 1. Voted | 185 (3%) | 883 (14%) | 2148 (34%) | 1749 (28%) | 1388 (22%) |
Note: row % in parentheses.
Here, the results can give the misleading impression that there’s a curvilinear relationship: turnout drops off for Bachelor’s Degrees and above.
Either of these tables might be a valid way to look at these data, but they answer slightly different questions:
If I want to compare turnout at different levels of education, then I need to calculate % turnout among people with different levels of education.
If I want to compare education among voters and non-voters, then I need to calculate % education among people who voted and didn’t vote.
Which variable is the IV or DV is sometimes a theoretical question, but in this case it’s unlikely that voting is causing people to become more educated, so it probably doesn’t make sense to calculate percentages by voting vs. non-voting.
When we have an interval-level outcome and a categorical independent variable, we can group each observation by values of the IV and then calculate the mean across each group.
For instance, suppose I want to examine the relationship between national wealth and carbon emissions. My hypothesis is that wealthier nations will have more emissions compared to poorer nations.
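To make the contrast concrete, here is a small Python sketch that computes both sets of percentages from the education and turnout counts in the tables above. The variable names are illustrative only.

```python
education_levels = [
    "Less than high school", "High school", "Some post-high school",
    "Bachelor's degree", "Graduate degree",
]
did_not_vote = [130, 286, 380, 135, 91]
voted = [185, 883, 2148, 1749, 1388]

# Column %: share who voted WITHIN each education level (categories of the IV).
turnout_by_education = [
    round(100 * v / (v + n)) for v, n in zip(voted, did_not_vote)
]

# Row %: share at each education level WITHIN voters (categories of the DV).
education_among_voters = [round(100 * v / sum(voted)) for v in voted]

for level, col_pct, row_pct in zip(
    education_levels, turnout_by_education, education_among_voters
):
    print(f"{level}: {col_pct}% turnout; {row_pct}% of all voters")
```

The first list rises steadily with education, while the second peaks in the middle simply because more respondents fall in the middle education categories.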
| country | gdp.percap.5cat | co2.percap |
|---|---|---|
| Afghanistan | 1. $3k or less | 0.281803 |
| Albania | 3. $10k to $25k | 1.936486 |
| Algeria | 3. $10k to $25k | 3.988271 |
| Angola | 2. $3k to $10k | 1.194668 |
| Argentina | 3. $10k to $25k | 3.995881 |
| Armenia | 3. $10k to $25k | 2.030401 |
| Australia | 5. $45k or more | 16.308205 |
| Austria | 5. $45k or more | 7.648816 |
| Azerbaijan | 3. $10k to $25k | 3.962984 |
| Bahrain | 5. $45k or more | 20.934996 |
GDP data has been grouped into five categories, so now I just need to calculate the average of CO2 emissions within each group of the ordinal IV:
| GDP Per capita range | CO2 emissions per capita |
|---|---|
| 1. $3k or less | 0.3128312 |
| 2. $3k to $10k | 1.2680574 |
| 3. $10k to $25k | 4.4065669 |
| 4. $25k to $45k | 8.0307610 |
| 5. $45k or more | 12.3134306 |
Is this generally consistent with expectations?
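The group-and-average step can be sketched in Python as follows. This example uses only the ten countries shown in the excerpt above, so its group means will not match the full-sample table (and the "$25k to $45k" group does not appear at all); the variable names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (country, GDP per capita category, CO2 emissions per capita),
# copied from the ten-row excerpt of the data.
rows = [
    ("Afghanistan", "1. $3k or less", 0.281803),
    ("Albania", "3. $10k to $25k", 1.936486),
    ("Algeria", "3. $10k to $25k", 3.988271),
    ("Angola", "2. $3k to $10k", 1.194668),
    ("Argentina", "3. $10k to $25k", 3.995881),
    ("Armenia", "3. $10k to $25k", 2.030401),
    ("Australia", "5. $45k or more", 16.308205),
    ("Austria", "5. $45k or more", 7.648816),
    ("Azerbaijan", "3. $10k to $25k", 3.962984),
    ("Bahrain", "5. $45k or more", 20.934996),
]

# Group the CO2 values by category of the IV.
co2_by_group = defaultdict(list)
for _country, gdp_group, co2 in rows:
    co2_by_group[gdp_group].append(co2)

# Average the DV within each group; the numeric label prefixes keep
# the categories in order when sorted.
group_means = {g: mean(values) for g, values in sorted(co2_by_group.items())}

for gdp_group, avg in group_means.items():
    print(f"{gdp_group}: {avg:.2f}")
```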
Here again, the relationship can be easier to conceptualize if we plot it.
A relationship like this will rarely be perfectly straight, so “linearity” and “curvilinearity” are partly a matter of degree, but there are some cases where there is a clear “U” shape to the relationship:
| iv | dv |
|---|---|
| 1. Extremely liberal | 6.314 |
| 2. Liberal | 5.685 |
| 3. Slightly liberal | 5.001 |
| 4. Moderate; middle of the road | 4.651 |
| 5. Slightly conservative | 4.636 |
| 6. Conservative | 4.974 |
| 7. Extremely conservative | 5.363 |
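One rough way to check for a "U" shape in group means like these is to see whether the values fall to a single low point and then rise. The sketch below applies that check to the seven values in the table; it is only a descriptive heuristic, not a formal test of curvilinearity.

```python
# DV means for the seven ordered IV categories, from the table above.
group_means = [6.314, 5.685, 5.001, 4.651, 4.636, 4.974, 5.363]

# Change in the DV from each category to the next.
diffs = [b - a for a, b in zip(group_means, group_means[1:])]

# Position of the lowest group mean.
lowest = group_means.index(min(group_means))

# U shape: the minimum is in the interior, with decreases before it
# and increases after it.
falls_then_rises = (
    0 < lowest < len(group_means) - 1
    and all(d < 0 for d in diffs[:lowest])
    and all(d > 0 for d in diffs[lowest:])
)

print(lowest + 1, falls_then_rises)
```

Here the low point is the fifth category ("Slightly conservative"), with higher values toward both ends of the scale.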